Basics of RStudio

RStudio is an Integrated Development Environment (IDE) for R, a programming language for statistical computing and graphics. It is available in two formats: RStudio Desktop is a regular desktop application while RStudio Server runs on a remote server and allows accessing RStudio using a web browser (Wikipedia description, impeccable).

For those who have never seen it before

Downloading and Installing R and RStudio

To use RStudio, you first need to download it here. As you can see, step number 1 is to download and install R! Too many students install only RStudio and are then surprised when something doesn’t work…

Opening RStudio for the first time

Once you have downloaded and installed both RStudio and R, you can start by opening RStudio. This is what it will look like (you may have a white background: you can change it by going to \(RStudio\rightarrow Preferences\rightarrow Appearance\)).

As you can see, the screen is initially split into 3 parts:

  • LEFT SIDE: here you see the Console open, but you could also switch to the Terminal or to a window called Jobs. We focus here on the Console only, as the other two are not relevant to beginner users. More details below

  • TOP RIGHT SIDE: here you see a window called Environment (plus History and Connections). We will focus only on the Environment in this guide. More details below

  • BOTTOM RIGHT SIDE: again, multiple windows. Files allows you to navigate through your folders and open files directly from RStudio; Help is almost self-explanatory, as it is the window where help files are displayed (e.g. R introductory guide, descriptions of functions, etc.); Viewer displays different outputs depending on the task, for example tables in html format, if you ask R to produce one using certain specific instructions; the other two windows will be discussed below.

Console

Focus first on the left side of the RStudio page. As you can see, the window that is open is called Console. Make sure you have the most recent version of R installed; the version number is shown at the very top of the Console (in my case, R 4.0.5, the most recent at the time of writing). The Console is where you can type and execute single operations using the R language.

What does that mean? R is a language. Learn to speak it, and you will be able to do amazing things with your data! Speaking it means learning how to give the computer certain instructions; in return, the computer will execute each instruction and produce an output.

For more formal definitions of programming languages you can search on Google, while for a more formal introduction to R, you can follow one of the first instructions you see in the console \(\implies\) type help.start() (then press Return), and you will see in the bottom-right corner the guide appearing in the Help window.

However, this guide is intended to be more practical and quick. So let’s see an example of an instruction that we can give to R (well, typing and executing help.start() was already an instruction).

Let’s ask R to perform a very complex mathematical operation.

What you see is:

  • The line where you type the instruction (2+2)

  • The output of the operation you asked the computer to perform (4)
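In the Console, the exchange just described looks roughly like the sketch below. The line starting with `#>` marks what R prints back; it is not something you type.

```r
# Typed after the > prompt in the Console, followed by Return
2 + 2
#> [1] 4
```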

This is in short what the Console is and does. However we won’t be using it very much. WHY NOT? In one word: reproducibility.

Imagine you have to perform the following tasks:

  • Download a dataset from a website

  • Clean the dataset (e.g. remove variables you don’t need, build new variables, etc.)

  • Produce some descriptive graph (e.g. a histogram of a variable, a scatter plot, etc.)

  • Analyse the data with some fancy method (e.g. Poisson regression, OLS, etc.)

You may have to write between 200 and 300 instructions (or more… or less). What if you do something wrong at a certain point and have to retrieve a specific instruction? You want to have a file where you can read at any moment what you did and how you did it.

R Scripts

When you start a project (e.g. solving a problem set, your thesis, etc.) you want to make sure that you record every step you make. R scripts are one tool that helps us precisely to achieve this goal.

The window that opens when you create a new script (File → New File → R Script) is where you can start typing your instructions for the computer. Since a project will ideally involve many lines of code, it is useful to take notes in the script about what you’re doing. To this end, you can write notes or comments of any sort, as long as each one is preceded by a hash (#). Anything else will be treated as an instruction (e.g. generate a function, execute one, open a dataset, perform an operation)!

As you can see from the example in the picture, if you type \(2+2\) (or any other operation), you don’t immediately see the output. That’s because writing a line of code doesn’t mean executing it as well. To execute a line of code you can either click Run or use the keyboard shortcut cmd+Return (for Mac; ctrl instead of cmd for Windows). I strongly suggest learning to use keyboard shortcuts.
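As a minimal sketch, a script mixing notes and instructions might look like this (the wording of the comments is only illustrative):

```r
# My first script: lines starting with a hash are notes for humans.
# R skips them when the script is run.

2 + 2   # an instruction; its output shows in the Console once you run the line
```

Place the cursor on the line with `2 + 2` and press the shortcut described above to execute it.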

Now it’s time to introduce the other windows in RStudio.

Environment

What is the Environment? It’s where you see all the objects you create. What does that mean? Let’s find out with practical examples.

Let’s “store” some operations as objects.

Three things can be noticed here:

  • Objects can be created by simply writing name of object = instruction for computer
    • Remark: you choose the name
    • Objects can be any acceptable instruction
      • Not acceptable instructions will return an error message in the Console

  • When you create objects and execute the line, the Console does not return any output
    • Indeed you are not asking R to show the output, rather just to store an operation

  • Finally something appears in the Environment! Here is where you see your stored objects
    • In the specific case, since we are creating values you already see the result of the operation
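A minimal sketch of storing operations as objects, following the pattern name of object = instruction. The object names and the operations are hypothetical; any valid names would do.

```r
# name of object = instruction for computer
sum_result = 2 + 2
product_result = 5 * 3
ratio_result = 10 / 4

# Nothing is printed by the lines above.
# To see a stored value in the Console, type the object's name:
sum_result
#> [1] 4
```

After running these lines, the three objects should be listed in the Environment together with their values.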

Let’s now create an object every econometrician should be familiar with: a data frame. It’s not important to understand how to create a dataset right now (normally you will import them from external sources); I do it here for the sole purpose of showing you how it appears in the Environment.

For you to know what happens below:

  • The dataset will have two variables
  • One variable (var1) will have as its 3 observations the three objects stored before
  • The other will have 3 random names as observations

What’s important here is how you see the dataset in the Environment:

  • The preview already shows how many columns (variables) and rows (obs.) make up the data
  • The small blue circle on the left side allows you to see what type of variables are in the dataset
    • You can see the variable names, what type of variables they are (e.g. numeric, factor, string, etc.) and a preview of the values contained in them
  • The grid symbol allows you to browse your data: a new window will open next to the working script and you will be able to give a broader look at your dataset
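One possible way to build such a dataset is sketched below. Only the name var1 comes from the description above; the three stored values, the name var2, the object name mydata and the three first names are placeholders chosen for illustration.

```r
# Three stored values (hypothetical, recreated here so the example runs on its own)
value1 = 2 + 2
value2 = 5 * 3
value3 = 10 / 4

# A data frame with two variables and three observations
mydata = data.frame(
  var1 = c(value1, value2, value3),      # the three stored objects
  var2 = c("Anna", "Marco", "Lucia")     # three arbitrary names
)

# Same information as the blue circle in the Environment: names, types, preview
str(mydata)
```

Running `View(mydata)` in RStudio should have the same effect as clicking the grid symbol.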

Finally, before we move to the bottom-right corner, you can navigate through every command you executed since you opened RStudio in the History window. Remember: we use a script exactly so that we don’t have to use the History window.

Packages

When moving to the bottom-right corner, we see again different windows. Let’s highlight the Packages one.

What are packages?
Long story short: some instructions can be very complex to write \(\rightarrow\) packages make our life easier, as they allow us to give the computer complex instructions with just a few lines of code.

Example: you want to produce a bar graph using your data. How would you tell RStudio to take the data, group them by categories, compute frequencies and represent everything on a .png output showing an \(x\) and \(y\) axis, etc.? A package for graphs will help us give this instruction in just one line!
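As a rough illustration, the sketch below uses the graphics package that ships with R (this guide may well have a different graphing package in mind). The vector `answers` holds made-up values used only for the example.

```r
# Made-up categorical data, for illustration only
answers = c("yes", "no", "yes", "maybe", "yes", "no")

# One line: table() groups by category and counts frequencies,
# barplot() draws the bars with both axes
barplot(table(answers))
```

Writing the grouping, counting and drawing by hand would take many more lines; the package functions hide that work.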

As you can see, many packages are displayed in the Packages window, each alongside a short description. This means they are already installed. The ones which are ticked are, in addition, already loaded and therefore ready for use. Mind the difference: installing a package is not sufficient to start using it, you must also load it.

Installing a package

  • This is a one-time operation \(\implies\) don’t type it in the script

    • It’s the only exception to writing everything in the script
  • Therefore: type it and execute it directly in the console

  • Syntax: install.packages("name-of-package")

Example: for convenience (I’ll tell you soon why “for convenience”), let’s install a package called pacman.

As you see, the console shows the downloading process, until eventually you read the message: “The downloaded binary packages are in bla bla”. This means it’s successfully installed.

Loading a package

  • This should always be in the script
    • Every time you start a project, you may begin by loading all the packages that you think you will need
    • Every time you close RStudio and re-open it, the packages will still be installed, but they will have to be loaded again
  • Syntax: library(name-of-package)
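Putting the two steps side by side for pacman gives the sketch below. The installation line is left as a comment because it belongs in the Console and needs an internet connection; the `requireNamespace()` check is an addition so that the example runs even when pacman is missing.

```r
# One-time step, typed directly in the Console (note the quotes):
# install.packages("pacman")

# In the script, at the top: load the package (no quotes needed)
if (requireNamespace("pacman", quietly = TRUE)) {
  library(pacman)
} else {
  message("pacman is not installed yet: run install.packages(\"pacman\") in the Console first")
}
```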

A useful trick

Every time you want to install and load a package, you actually have 2 options:

  • use install.packages("package_name") and library(package_name) as just shown

  • exploit a function in the package pacman (that’s why I made you install it!): just type p_load(package_name) in the script and execute

    • if the package is already installed, it will be simply loaded (so it’s like library())
    • if it’s not yet installed, it will be first installed, then loaded (so it combines install.packages() and library() at once). Quite convenient, huh?
    • just remember to execute library(pacman) at the very beginning, otherwise you can’t use p_load
    • you can install/load many packages at once! For example with two packages just type p_load(package1, package2). Attention: separated by a comma, always!

Let’s see how it works. I will now install and load a package which contains many datasets: wooldridge (to be precise, it contains 111 datasets from “Introductory econometrics” by Jeffrey Wooldridge, very useful to do some practice)
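In a script, this step might look like the sketch below. It assumes pacman is already installed; the `requireNamespace()` guard is only there so the example does not stop with an error if it is not.

```r
if (requireNamespace("pacman", quietly = TRUE)) {
  library(pacman)        # must come first, otherwise p_load is not available
  p_load(wooldridge)     # installs wooldridge if missing (needs internet), then loads it
} else {
  message("Install pacman first: install.packages(\"pacman\")")
}
```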

As you see, the package is installed and loaded at the same time! Let’s use one of these datasets to see what the last window, Plots, is for.

Plots

Our tour concludes with showing you the purpose of the window Plots. It’s actually quite intuitive to guess what it does. So let’s see it in practice.

We now:

  • import a dataset using package wooldridge
    • remember: we store it as an object, so it will appear in the Environment
  • produce a pair of graphs using this dataset
    • the plots will appear in the window Plots

Let’s start with the dataset. Don’t worry about the syntax, you don’t have to learn it now.
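A possible version of this step is sketched below. The choice of the wage1 dataset and the object name `wages` are assumptions made for illustration; any dataset in the package can be stored the same way.

```r
if (requireNamespace("wooldridge", quietly = TRUE)) {
  # Store one of the package's datasets as an object
  wages = wooldridge::wage1

  # Number of rows (observations) and columns (variables)
  print(dim(wages))
} else {
  message("Install wooldridge first, for example with p_load(wooldridge)")
}
```

Once the line is executed, `wages` should appear in the Environment like the small data frame created earlier.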

REMARK: notice that if you give an object the name of an existing one, the existing one is replaced by the new one. If you want to avoid overwriting existing objects choose different names for new objects.
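A tiny sketch of the overwriting behaviour (the object name is hypothetical):

```r
my_object = 10
my_object = "ten"   # same name: the number 10 is replaced without any warning

my_object
#> [1] "ten"
```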

So, now that we have our data, we can plot the histogram of a variable. Again, you don’t need to learn how to do this now. Let’s just see where the outcome is produced.
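For instance, assuming the wage1 dataset from wooldridge and its `wage` variable (an illustrative choice), a histogram could be produced as follows:

```r
if (requireNamespace("wooldridge", quietly = TRUE)) {
  wages = wooldridge::wage1

  # Histogram of one variable; in RStudio it is drawn in the Plots window
  hist(wages$wage, main = "Hourly wage", xlab = "wage")
}
```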

As you can see the plot is shown in the Plots window. Let’s make another one.
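A second graph, again assuming the wage1 dataset and its `educ` and `wage` variables purely as an example, could be a scatter plot:

```r
if (requireNamespace("wooldridge", quietly = TRUE)) {
  wages = wooldridge::wage1

  # Scatter plot of two variables; it is added to the Plots window as a new plot
  plot(wages$educ, wages$wage, xlab = "educ", ylab = "wage")
}
```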

Now that we have made two graphs you can switch from one to the other just by pressing the arrows icons in the Plots window. Quite easy, right?

Just a few more things, then you’ll be able to start getting your hands dirty with R and RStudio.

Wrapping code into sections

If the problem set you have to solve, or your MSc thesis, requires many lines of code and very different instructions, it may be useful to split your code into sections so that you can navigate through it more easily.

How to do it? The syntax is very simple: # this is a section ----. So, as usual, since you are writing text, begin with #. Then write the name of the section, followed by ---- (or ####, or ====, it’s all the same).
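A short sketch of a script split into three sections, one for each accepted ending. The section names and the values are made up for the example.

```r
# Load data ----
scores = c(12, 15, 9)

# Descriptive statistics ====
average_score = mean(scores)

# Final checks ####
average_score
#> [1] 12
```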

As soon as you create a section, a small arrow icon appears next to the section name. If you click on it, you’ll see you can hide the content of the section. Finally, when you have many sections, it’s easy to jump from one to the other, as I show you below.

So, keep your code tidy, use sections!

A final remark on Error messages

Learning a new language is never easy. At the beginning you are likely to make many mistakes. As you make mistakes, the person to whom you are speaking may not understand what you mean. Eventually this may lead to big misunderstandings. For example, take the classic Italian going to Malta case.

With programming languages it’s the same. If you speak to the computer using the wrong words, the computer won’t understand and will return an error message.

It is therefore extremely important that whenever you use a function from any given package, you pay extra attention to use the correct syntax. To help you do so, every package provides extensive documentation. You can simply Google the package name and open the link with the R documentation. Examples of common mistakes:

  • install.packages requires the package name in quotes (" ")
  • packages loaded with p_load must be separated by a comma (,)
  • Text in the script must always be preceded by a #

If you don’t follow the rules, the Console will return an error message. The message is usually helpful to understand what went wrong. In my experience it’s trivial mistakes that trigger the errors, therefore if you see one, just re-read your code and see if there is a missing " or a missing ).
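The sketch below triggers two typical mistakes on purpose. It wraps them in `tryCatch()` (a base R function not covered in this guide) only so that the script keeps running and the error messages can be printed; the object name `not_created_yet` is deliberately one that does not exist.

```r
# Mistake 1: using an object that was never created
msg_missing = tryCatch(
  not_created_yet + 1,
  error = function(e) conditionMessage(e)
)
print(msg_missing)

# Mistake 2: forgetting the closing parenthesis, so R cannot read the line at all
msg_syntax = tryCatch(
  parse(text = "sum(1, 2"),
  error = function(e) conditionMessage(e)
)
print(msg_syntax)
```

In both cases the printed message points at the problem, which is usually enough to find the line to fix.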

Now you are ready to start using RStudio for Econometrics!